It’s less common in Psychology to use count variables as outcomes, but they’re extremely useful. They have an interpretable metric, and they’re usually in units that are meaningful (and possibly important).
The maximum-entropy distribution for a count variable – non-negative integers with no upper bound (unlike the binomial) – is the Poisson. The family is defined by a single parameter, \((\lambda)\).
As a reminder, the mean and the variance of a Poisson distribution are both equal to \(\lambda\); use that fact to interpret your estimates and to set your priors.
The conventional link function for the Poisson is the log link, which ensures that \(\lambda\) is always positive.
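A quick simulation check of that mean-variance property (base R):

```r
# draws from a Poisson with lambda = 4: mean and variance should both be ~4
set.seed(9)
y <- rpois(1e5, lambda = 4)
mean(y)
var(y)
```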
visualizing priors
Our brms code will use a log-link function to estimate \(\lambda\) from a linear model:
log(lambda) = a + b*var
Your priors for a and b will likely be Gaussian, for example:
a ~ Normal(0,1)
But the inverse link is an exponential transformation, which makes the relationship between your linear coefficients and the resulting \(\lambda\) difficult to intuit. Simulating from the priors is the best way to see what they imply.
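For instance, a seemingly tame \(a \sim \text{Normal}(0, 1)\) prior implies a right-skewed distribution for \(\lambda\) (base R):

```r
# push prior draws of a through the inverse link to see the implied lambda
set.seed(1)
a <- rnorm(1e4, mean = 0, sd = 1)
lambda <- exp(a)                   # inverse of the log link
mean(lambda)                       # ~1.65, not 1: exponentiation pulls the mean up
quantile(lambda, c(0.025, 0.975))  # roughly 0.14 to 7
```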
   culture    population contact total_tools mean_TU
1  Malekula         1100 low              13     3.2
2  Tikopia          1500 low              22     4.7
3  Santa Cruz       3600 low              24     4.0
4  Yap              4791 high             43     5.0
5  Lau Fiji         7400 high             33     5.0
6  Trobriand        8000 high             19     4.0
7  Chuuk            9200 high             40     3.8
8  Manus           13000 low              28     6.6
9  Tonga           17500 high             55     5.4
10 Hawaii         275000 low              71     6.6
We’ll model the idea that:
The number of tools increases with the log population size. Why log? Because that’s what the theory says: that it is the order of magnitude of the population that matters, not the absolute size of it.
The number of tools increases with the contact rate among islands. No nation is an island, even when it is an island. Islands that are better networked may acquire or sustain more tool types.
The impact of population on tool counts is moderated by high contact. This is to say that the association between total_tools and log population depends upon contact.
m2 <- brm(
  data = Kline, 
  family = poisson,
  bf(total_tools ~ a + b * P,   # P: log population (likely standardized)
     a + b ~ 0 + contact,       # separate a and b for each contact level
     nl = TRUE),
  prior = c(prior(normal(3, 0.5), nlpar = a),
            prior(normal(0, 0.2), nlpar = b)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, seed = 11,
  file = here("files/models/61.2"))
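The summary below is for m1, an intercept-only Poisson model whose fitting code isn't shown above; it was presumably fit with something like this (the prior here is an assumption):

```r
# hypothetical reconstruction of the intercept-only fit summarized below
m1 <- brm(
  data = Kline, 
  family = poisson,
  total_tools ~ 1,
  prior = prior(normal(3, 0.5), class = Intercept),  # assumed prior
  iter = 2000, warmup = 1000, chains = 4, cores = 4, seed = 11)
```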
m1
Family: poisson
Links: mu = log
Formula: total_tools ~ 1
Data: Kline (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 3.54 0.05 3.44 3.64 1.00 1716 1970
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
m2
Family: poisson
Links: mu = log
Formula: total_tools ~ a + b * P
a ~ 0 + contact
b ~ 0 + contact
Data: Kline (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a_contacthigh 2.84 0.47 1.92 3.74 1.00 1378 1405
a_contactlow 1.69 0.29 1.13 2.25 1.00 1439 1323
b_contacthigh 0.09 0.05 -0.01 0.19 1.00 1399 1489
b_contactlow 0.19 0.03 0.14 0.25 1.00 1444 1295
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
What do these mean?
Once we’ve moved outside the Gaussian family, your best bet is to push everything back through the posterior and the inverse link. Do NOT try to interpret the raw coefficient estimates directly; they live on the log scale.
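As a quick sanity check on the scale (not a substitute for pushing the full posterior draws through the model), exponentiating the intercepts from the m2 summary gives the expected tool count at P = 0, assuming P was standardized so that 0 is the average log population:

```r
# posterior-mean intercepts from the m2 summary, back-transformed
exp(1.69)  # low contact:  ~5.4 expected tools at average log population
exp(2.84)  # high contact: ~17 expected tools
```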
The binomial distribution is appropriate when you have only two outcomes, but what if you have multiple unordered outcomes? In that case, you should use the multinomial distribution, of which the binomial is just a special case. If there are \(K\) types of events with probabilities \(p_1,...,p_K\), then the probability of observing \(y_1,...,y_K\) events of each type out of \(n\) total trials is:
\[
\text{Pr}(y_1,...,y_K| n, p_1,...,p_K) = \frac{n!}{\prod_i y_i!}\prod_{i=1}^Kp_i^{y_i}
\]
The link function used with the multinomial is the multinomial logit, whose inverse is also called the softmax function.
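A minimal version in base R (mirroring `rethinking:::softmax`, which is used below): exponentiate each score, then normalize so the probabilities sum to 1.

```r
# softmax: map K unbounded scores to K probabilities that sum to 1
softmax <- function(scores) exp(scores) / sum(exp(scores))
softmax(c(0.5, 1.0, 2.5))  # the largest score gets the largest probability
```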
The biggest issue is what to do with the multiple linear models. In a binomial GLM, you can pick either of the two possible events and build a single linear model for its log odds; the other event is handled automatically. But in a multinomial GLM, you need \(K - 1\) linear models for \(K\) types of events. One of the outcome values is chosen as a “pivot” and the others are modeled relative to it. In each of the \(K - 1\) linear models, you can use any predictors and parameters you like; they don’t have to be the same, and there are often good reasons for them to be different. In the special case of two types of events, none of these choices arise, because there is only one linear model. That’s why the binomial GLM is so much easier.
There are two basic cases:
predictors have different values for different values of the outcome, and
parameters are distinct for each value of the outcome.
The first case is useful when each type of event has its own quantitative traits, and you want to estimate the association between those traits and the probability each type of event appears in the data.
The second case is useful when you are interested instead in features of some entity that produces each event, whatever type it turns out to be.
example: predictors matched to outcomes
You are modeling career choice for young adults. One predictor of choice is income.
# simulate career choices among 500 individuals
N <- 500               # number of individuals
income <- c(1, 2, 5)   # expected income of each career
score <- 0.5 * income  # scores for each career, based on income

# next line converts scores to probabilities
p <- rethinking:::softmax(score[1], score[2], score[3])

# now simulate choice
# outcome career holds event type values, not counts
career <- rep(NA, N)  # empty vector of choices for each individual

# sample chosen career for each individual
set.seed(34302)
for (i in 1:N) career[i] <- sample(1:3, size = 1, prob = p)
# put them in a tibble
d <- tibble(career = career) %>%
  mutate(career_income = ifelse(career == 3, 5, career))

# plot 
d %>%
  ggplot(aes(x = career)) +
  geom_bar(linewidth = 0)
We have to choose a reference category. For this example, we’ll choose career 3. Let’s fit an intercept-only model to start. The next question is: what are our priors?
get_prior(data = d, family = categorical(link = logit, refcat = 3), career ~ 1)
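Note that the output printed below is the summary of a fitted intercept-only categorical model rather than a `get_prior()` table. Given the formula shown in that output, the fit was presumably something like the following (the model name and the priors are assumptions):

```r
# hypothetical reconstruction of the intercept-only categorical fit
m3 <- brm(
  data = d, 
  family = categorical(link = logit, refcat = 3),
  bf(career ~ 1,
     nlf(mu1 ~ a1),
     nlf(mu2 ~ a2),
     a1 + a2 ~ 1),
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),   # assumed prior
            prior(normal(0, 1), class = b, nlpar = a2)),  # assumed prior
  iter = 2000, warmup = 1000, chains = 4, cores = 4, seed = 11)
```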
Family: categorical
Links: mu1 = logit; mu2 = logit
Formula: career ~ 1
mu1 ~ a1
mu2 ~ a2
a1 ~ 1
a2 ~ 1
Data: d (Number of observations: 500)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a1_Intercept -2.01 0.16 -2.33 -1.71 1.00 3390 2673
a2_Intercept -1.53 0.12 -1.77 -1.29 1.00 2964 2810
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Now we can add income. Remember, in this model, the predictor is matched to the outcome (that is, the value is the same for all observations with that outcome). So rather than using a variable from our dataset, we’ll just input the value directly into our model.
m4 <- brm(
  data = d, 
  family = categorical(link = logit, refcat = 3),
  bf(career ~ 1,
     nlf(mu1 ~ a1 + b * 1),
     nlf(mu2 ~ a2 + b * 2),
     a1 + a2 + b ~ 1),
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),
            prior(normal(0, 1), class = b, nlpar = a2),
            prior(normal(0, 0.5), class = b, nlpar = b, lb = 0)),
  iter = 2000, warmup = 1000, cores = 4, chains = 4, seed = 11,
  control = list(adapt_delta = .99),
  file = here("files/models/61.4"))
m4
Family: categorical
Links: mu1 = logit; mu2 = logit
Formula: career ~ 1
mu1 ~ a1 + b * 1
mu2 ~ a2 + b * 2
a1 ~ 1
a2 ~ 1
b ~ 1
Data: d (Number of observations: 500)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a1_Intercept -2.15 0.19 -2.59 -1.81 1.00 777 1018
a2_Intercept -1.81 0.28 -2.52 -1.41 1.01 750 706
b_Intercept 0.15 0.13 0.01 0.49 1.01 605 760
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
To interpret this coefficient, we can use counterfactual simulation. The question is, “How much more likely would someone be to choose Career 2 over Career 1 if the salary of Career 2 were twice what it is now?”
post = as_draws_df(m4)

s1      = post$b_a1_Intercept + post$b_b_Intercept * 1
s2_orig = post$b_a2_Intercept + post$b_b_Intercept * 2
s2_new  = post$b_a2_Intercept + post$b_b_Intercept * 2 * 2

# use the softmax link function to create probs
p_orig = sapply(1:nrow(post), function(i) rethinking:::softmax(s1[i], s2_orig[i], 0))
p_new  = sapply(1:nrow(post), function(i) rethinking:::softmax(s1[i], s2_new[i], 0))

p_diff = p_new[2, ] - p_orig[2, ]
rethinking::precis(p_diff)
Suppose you are still modeling career choice. But now you want to estimate the association between each person’s family income and which career they choose. So the predictor variable must have the same value in each linear model, for each row in the data. But now there is a unique parameter multiplying it in each linear model. This provides an estimate of the impact of family income on choice, for each type of career.
n <- 500
set.seed(11)

# simulate family incomes for each individual
family_income <- runif(n)

# assign a unique coefficient for each type of event
b <- c(-2, 0, 2)

career <- rep(NA, n)  # empty vector of choices for each individual
for (i in 1:n) {
  score <- 0.5 * (1:3) + b * family_income[i]
  p <- rethinking:::softmax(score[1], score[2], score[3])
  career[i] <- sample(1:3, size = 1, prob = p)
}

d <- tibble(career = career) %>%
  mutate(family_income = family_income)
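The notes stop before fitting this second model. Under the same `brms` conventions used above, the fit would presumably look something like this, with a separate `family_income` coefficient estimated for each non-reference category (the model name and priors are assumptions):

```r
# hypothetical fit: distinct intercept and slope for each outcome category
m5 <- brm(
  data = d, 
  family = categorical(link = logit, refcat = 3),
  career ~ 1 + family_income,
  prior = c(prior(normal(0, 1.5), class = Intercept, dpar = mu1),  # assumed
            prior(normal(0, 1.5), class = Intercept, dpar = mu2),
            prior(normal(0, 1), class = b, dpar = mu1),
            prior(normal(0, 1), class = b, dpar = mu2)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, seed = 11)
```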